Writing is a time-consuming process; writing high-quality publications requires attention to detail at every step of the way, from the actual prose on paper to its layout in the document to the presentation of figures. In this guide we walk you through 10 aspects of writing a scientific article using LaTeX to format your work and save you time. We emphasize typing commands at the Unix command line in this guide as a way for you to peek under the hood of the LaTeX engine. This will give you (the author!) power over the production of your own academic documents.[1]
This guide could be extremely long. There are many, many fantastic resources on typesetting. Here we have hand-selected 10 topics to help lower the barrier to a more efficient and higher quality paper writing workflow, including the programs tex, latex, pdflatex, xelatex, lualatex, etc.

To help people practice these commands we have hands-on examples ready in a JupyterLab session, through Binder. Here you can follow along, processing documents in a terminal session. You can start this environment from the launch binder link in the readme.md file of the associated GitHub repository.
To use LaTeX on your own computer, you will need to install it (we highly recommend TeX Live on each system).
A LaTeX document (or a .tex file) is a plain text document
that contains commands that tell the LaTeX processing program how to
create a beautiful pdf. These commands can be “markup” like
\textbf{this is bold} for bold text or
$\alpha + \beta \frac{1}{x^2}$ for math like \(\alpha + \beta \frac{1}{x^2}\) or commands
that tell LaTeX about document structure like
\section{Introduction} or even commands to identify a
bibliography like \bibliography{refs_example.bib}.
Once you have a plain text document with markup, you then process it
using a set of programs to create a publishable output like a
.pdf file. This figure shows an example of a LaTeX document
and highlights different parts of the document and their role.
The Structure of a LaTeX document
After processing that document (via, say, the command latexmk
-pdflatex example.tex, assuming that the document is called
example.tex), one can see a pdf file like the following
image:
The associated pdf document
What does example.tex look like when compiled to a pdf
document? Can you add a title or author? Can you make some text bold?[2] You can
practice by following these steps (and similar ones in later sections):

- Open the directory 1_structure in the JupyterLab window that launches when you click on launch binder from the readme.md file in the associated GitHub repository.
- Click on the Terminal icon in the JupyterLab panel.
- Type latexmk -pdflatex example.tex and then look at the pdf.

You can also copy the GitHub repository to your own local machine and launch the Terminal to see a Unix command prompt if you are using a Mac or Linux machine. Windows machines also offer a Unix command prompt, but it is a bit more involved to install it.
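As a minimal sketch of the markup described above, the following hypothetical file (call it minimal.tex; the title, author, and sentence are placeholders and are not the contents of the repository's example.tex) combines a title, an author, a section heading, bold text, and inline math:

```latex
% minimal.tex: a hypothetical stand-alone document (not the repository's example.tex)
\documentclass{article}

% Preamble: metadata that \maketitle will print
\title{Some Paper}
\author{Some Person}

\begin{document}
\maketitle % prints the title, author, and date

\section{Introduction}
% \textbf makes text bold; $...$ typesets inline math
This sentence has \textbf{bold text} and inline math such as
$\alpha + \beta \frac{1}{x^2}$.

\end{document}
```

Processing this file the same way as example.tex should give a one-page pdf, which makes it a convenient place to experiment before editing a real paper.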
tex, latex, pdflatex, etc

Although the most basic program that parses markup is called
latex, in current daily use, you will mostly find yourself
using pdflatex or even xelatex or maybe
lualatex.
When Donald
Knuth created this approach to making beautiful scientific
documents, he started with the tex program but Leslie Lamport
built latex by combining multiple tex commands
into fewer and simpler macros. Both originally created documents in
dvi or postscript format. Nowadays,
pdf files are the best way to make a document that looks
the same to all who want to view it on their screens or print it for
themselves.
Here is a list of the common programs that one might use to create a pdf file from a LaTeX document:

- tex: a program that typesets TeX directives or macros
- pdftex: a program that generates a PDF (instead of DVI)
- latex: a program that typesets a pile of LaTeX directives and macros
- pdflatex: a program that generates a PDF from LaTeX
- bibtex: a program that takes bibliographic information from a .aux file (created by a run of latex or pdflatex etc.) and generates a bibliography.
- biber: a program like bibtex but with more database management capabilities.
- xelatex: support for a wide variety of fonts and characters (you can type xelatex example.tex after changing the font to one that is installed on your system).
- lualatex: extends latex so that more programming can be done within it (via Lua) for more complicated document designs and workflows.

For example, at the command prompt in the Terminal, you might type
pdflatex example.tex to create an example.pdf
file (if you only do it once, the citation will show up as a
? and no bibliography will be printed).
Notice also:

- pdflatex (or xelatex or lualatex) takes several passes (it must be run more than one time) if your document involves citations or other more complex features (like cross-references, tables of contents, etc.).
- latexmk or latexrun automate this process of multiple passes by a latex processing program and a bibliography creation program.

The following figure shows how it may require three runs of
pdflatex (plus a run of bibtex) to go from an
example.tex file to an example.pdf file:
From LaTeX to PDF commands
You can replace those multiple lines with a single call to
latexmk -pdflatex example.tex.
Prefer PDF output (via pdflatex) and PDF figures (or PNG … more on this later) rather than DVI or PS format for sharing generated documents.

See the directory 2_texflavors and the readme.md file therein. Can you change the font and use xelatex to make a pdf, say, trying latexmk -xelatex example.tex?
A given scientific paper will require many files and often involves
many authors. For example, several .tex files (for
different sections), multiple figures (in the form of
.pdfs), and bibliographies (in .bib files) may
all be part of the paper. Organizing these files in a consistent fashion
will lead to a clear process when dealing with revisions at a later
date.
As a specific example a main.tex file might look like
this:
\documentclass{article}
\usepackage{graphicx} % provides \includegraphics, used in results.tex
\title{My Title}
\begin{document}
\maketitle
\input{abstract}
\input{intro}
\input{results}
...
\bibliography{mybib.bib}
\end{document}
But results.tex might look like this:
\section{Results}
Figure~\ref{fig:vaccine_by_pop} shows that opposition to vaccination peaks at a population of 100,000.
\begin{figure}[!ht]
  \centering % center the image inside the float rather than wrapping the float in center
  \includegraphics[width=.8\textwidth]{vaccine_by_pop.pdf}
  \caption{Vaccination opposition by population}\label{fig:vaccine_by_pop}
\end{figure}
The number 100,000 and the figure
vaccine_by_pop.pdf are derived from the R file called
vaccine_by_pop.R. This R file relies on data that is
cleaned by vaccine_data_cleaning.py, in addition to data
that are downloaded, cleaned, and merged from the web.
So how do we organize the data, the files, and the overall workflow? There are many possibilities, but we’re reminded by a slice of the Zen of Python:
Simple is better than complex. Complex is better than complicated. Flat is better than nested.
We provide two specific examples of workflows below, first noting two aspects that will greatly improve your process. The first is to separate your data from your processing and presentation:

- raw data (data1.csv, ..., datan.csv)
- merged and filtered data (data_merged_filtered.db)
- data for a single figure (temp_vs_time.csv)
- the script that plots it (temp_vs_time.py)

The second aspect, directly related to LaTeX, is to establish a predictable naming convention. For example, each output like a table or figure uses one script with the same name: temp_vs_time.pdf <-> temp_vs_time.py, and the LaTeX label follows the same convention: \label{fig:temp_vs_time}. When editing the document, the path from figure to the associated plotting script and related data is then clear.
Here are two examples of directory structures that have worked for us:
In this example, we use Matt West’s directory structure, where the versions of the paper are kept in their own directories:
paper_topic_name_dir_name | string used for repo, tex, and bib files
+ requirements.txt | number of pages, etc
+ 1_submitted_paper
| +-- paper_topic_name.tex
| +-- refs_topic_name.bib
| +-- journal_class.cls | any files needed for the journal latex style
| +-- figures
| | +-- temp_vs_time.pdf | descriptive names for figures (not fig1.pdf, etc)
| | +-- error_vs_stepsize.pdf
| | `-- ...
| +-- data | data files that generate the figures
| | +-- Makefile | Makefile that will re-generate all figures
| | +-- temp_vs_time.csv | use the same name as the resulting figure
| | +-- plot_temp_vs_time.py | plotting scripts, use names like plot_.py
| | `-- ...
| `-- submitted_paper_topic_name.pdf | actual PDF file submitted
+ 2_reviews
| +-- review_1.pdf | individual reviews
| +-- review_2.pdf
| `-- editor_statement.pdf | instructions and summary from editor
+ 3_response_to_reviews
| +-- response_topic_name.tex
| `-- sent_response_topic_name.pdf | actual PDF file sent to editor
` 4_revised_paper
+-- paper_topic_name_revised.tex
+-- refs_topic_name_revised.bib
+-- journal_class.cls | copy here any other files needed
+-- figures | copy here all the figures again
| +-- temp_vs_time.pdf | edit figures as needed
| +-- error_vs_stepsize.pdf
| `-- ...
+-- data | copy all data again and edit as needed
| `-- ...
`-- submitted_paper_topic_name_revised.pdf | actual PDF submitted
Reference: Matt West @ https://lagrange.mechse.illinois.edu/latex_quick_ref/
An alternative approach uses git branches for different versions, and
a single Makefile for all tasks (from turning the paper
into a pdf file via LaTeX, to creating figures, etc.). See also the
discussion in Bowers and Voors (2016),
section 3.
paper_topic_name_dir_name | string used for repo, tex, and bib files
+ Makefile | file that tracks file relationships
+-- Data | directory for data and data cleaning, merging work
+ README.md | file with instructions and explanations
+ merge_data.R |
+ orig_data.csv | original data set, not to be changed
+ merge_data.csv |
`-- ... |
+-- Analysis |
+ README.md |
+ linear_simulations.R | file that runs simulations and saves output
+ linear_simulations.rda | output from linear_simulations.R
`-- ... |
+-- Figures |
+ README.md |
+ linear_simulations_N100.R | file creating a figure
+ linear_simulations_N100.pdf | the figure from linear_simulations_N100.R
+ descriptives.R | file creating a table
+ descriptives.tex | the table in LaTeX format
`-- ... |
+-- Paper |
+ README.md |
+ main.tex | the main LaTeX file
+ abstract.tex | the abstract file
`-- ... |
+-- References |
+ big.bib | bibliography file
`-- ... |
Now is better than never.

See the directory 3_workflows and the readme.md file therein.
Your writing is often interleaved with edits and contributions from co-authors. How do you track changes and versions in your LaTeX document?
We strongly recommend git version control via GitHub, either when working alone on a document or when multiple authors are involved. We do not describe git in depth here, but instead offer the following high-level best practices.
What files should you track (in version control)?

- your .tex file!
- the .bib file for your article
- ./figures/*.pdf
- ./data/*.py, ./data/*.R
- ./data/*.csv

What should you not track (in version control)?

- the compiled paper, such as paper_randnoise.pdf
- *.log, *.bbl, *.aux, etc.
- .DS_Store or other garbage from your system

Version control is invaluable as a collaboration tool; however, it does require diligence when working with co-authors on a LaTeX document. We recommend the following recipe:

- Clean up the generated files (latexmk myfile.tex -C) and recompile to verify there are no errors.

Few tools allow collaborators to edit plain text documents at the same time. We nearly always rely on asynchronous collaboration, even if we have broken up a task and the whole team is working on it at the same time, even in the same room.
Overleaf is designed for this task. It compiles LaTeX and syncs with github. See also the online versions of LaTeX listed here.
There are other systems for editing plain text at the same time such as Teletype for Atom.
See the directory 4_git and the readme.md
file therein.
The overarching style of your document is often decided by the journal. With this in mind, it is best to typeset your document with the journal’s style file. The Society for Industrial and Applied Mathematics (SIAM) provides style files directly whereas others, e.g. American Mathematical Society journals, are included with your TeX distribution and available on CTAN. In any case, committing to the expected format and not deviating from it will shorten your time-to-publication by not slowing down the copy editing at the journal. The style files will provide macros for author formats, custom figure environments, and almost certainly the preferred style for the bibliography. In addition, most journals provide a style guide that details the expectations on punctuation, hyphens, commas, etc.
See directory 5_style and readme.md for an
example.
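As a rough sketch of what typesetting with a journal's style file looks like, the skeleton below uses amsart, the AMS article class that ships with standard TeX distributions. The title, author details, and text are placeholders; a real submission would follow the specific journal's author instructions.

```latex
% A sketch of a paper skeleton using the AMS article class (amsart).
% All names and addresses below are placeholders.
\documentclass{amsart}

\title{A Placeholder Title}
\author{First Author}
\address{Department of Placeholders, Some University}
\email{first.author@example.edu}

\begin{document}

% amsart expects the abstract before \maketitle
\begin{abstract}
  One or two sentences summarizing the paper.
\end{abstract}

\maketitle

\section{Introduction}
The class, not the author, decides fonts, margins, and heading styles.

\end{document}
```

Notice that the author-format macros (\address, \email) come from the class itself, which is one reason to start from the journal's class rather than retrofit it at submission time.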
You already know Hemingway’s famous quote: “the only kind of writing is re-writing”. However, you might not know about linters.
A linter is a program that analyzes your text (sometimes in real time, as you write it). When your misspelled words are highlighted in your email client, you are seeing the results of a linter alerting you to improve your text. Linters are also used in programming, catching code errors before running the code by alerting you to unmatched parentheses or missing semicolons.
Other linters can look for issues with style. Consider the following terrible sentence:
More research is needed to fill the gap created in extant literature in order to impact policy with very important findings.
One linter, write-good, highlights several potential problems:
col 16 error| [write-good] "is needed" may be passive voice [E]
col 71 error| [write-good] "in order to" is wordy or unneeded [E]
col 102 error| [write-good] "very" is a weasel word and can weaken meaning [E]
Of course, linters cannot do it all. We use them because they draw attention to sentences that may need work. Ultimately they (hopefully) help focus our attention on prose: re-writing the sentence without using a passive voice, without using “impact” as a verb (!), and with a stronger justification for research than to just fill a gap in the literature.
There are many fantastic tips and guides to improving your writing, from reading paragraphs and sentences out loud to “edit by ear” (Becker 1986) to guides specific to academic writing: Gopen and Swan (1990) and Becker (1986). Here, we offer a few directions that improve your writing specifically in LaTeX:

- Use a linter or spell checker that checks your .tex document on-the-fly.
- Mark passages that still need work with % TODO, a comment in the .tex file. You can find all places where you have % TODO in your document using: grep TODO paper_randnoise.tex

See the directory 6_linting and the readme.md file therein.
You will find that authors have their own macros, their own style in
the .tex document, and their own preferences when using
LaTeX. Here we offer general principles that can help improve your
overall LaTeX workflow:
\begin{align}
\langle u, v \rangle & = \langle f, v\rangle\\
& = G(v)
\end{align}
\begin{tabular}{lrllr}
\toprule
& \multicolumn{1}{c}{$n$}
& \multicolumn{1}{c}{$t$}
& \multicolumn{1}{c}{$\rho$}
& \multicolumn{1}{c}{$m$} \\
\midrule
experiment 1 & \num{ 19929} & 0.32 & 0.8 & 55 \\
experiment 2 & \num{ 7729292} & 0.78 & 0.7 & 85 \\
experiment 3 & \num{888173928} & 1.25 & 0.65 & 2 \\
\bottomrule
\end{tabular}
Define macros for notation you use repeatedly near the top of your .tex file:
\newcommand{\Hcurl}{\vec{H}(\text{curl},\Omega)}
\renewcommand{\vec}[1]{\boldsymbol{#1}}
Avoid macros that make the .tex source unreadable.

- booktabs: provides clean horizontal lines for tables (avoid vertical lines), providing \toprule and \bottomrule in the example above.
- siunitx: formats large numbers and notation, providing \num in the example above.

Do not use \begin{align} for everything; instead try specific environments built for your purpose.

- equation is your base equation environment. Use this unless you have multiple equations.
- align should be used for multiple equations that require alignment.
- split is used for a single equation that requires alignment when split.
- multline is used for a single equation where no alignment is needed.
- subequations may be used around align to retain a single equation number.

See example.tex in 7_dos for examples of use.
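The equation environments listed above can be compared side by side. The following is a small sketch, assuming the amsmath package; the formulas and label names are placeholders chosen only to show each environment's behavior.

```latex
\documentclass{article}
\usepackage{amsmath} % provides align, split, multline, subequations

\begin{document}

% equation: a single numbered equation
\begin{equation}\label{eq:Axb}
  A x = b
\end{equation}

% align: several equations aligned at &, each with its own number
\begin{align}
  \langle u, v \rangle &= \langle f, v \rangle \label{eq:weak}\\
  G(v) &= \langle f, v \rangle \label{eq:functional}
\end{align}

% split inside equation: one equation, one number, aligned across lines
\begin{equation}\label{eq:expand}
  \begin{split}
    (a + b)^2 &= (a + b)(a + b) \\
              &= a^2 + 2ab + b^2
  \end{split}
\end{equation}

% multline: one long equation broken across lines without alignment
\begin{multline}\label{eq:long}
  p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 \\
       + a_5 x^5 + a_6 x^6 + a_7 x^7
\end{multline}

% subequations around align: the lines share one number with letters,
% such as (6a) and (6b) in this document
\begin{subequations}\label{eq:system}
  \begin{align}
    x + y &= 1 \label{eq:system_sum}\\
    x - y &= 0 \label{eq:system_diff}
  \end{align}
\end{subequations}

\end{document}
```

Compiling this once and looking at the equation numbers on the right margin is a quick way to see which environment matches what you want a reader to cite.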
Use descriptive labels with a prefix for each type of object:

- figures: \label{fig:easy_figure_name}

\begin{figure}[!ht]
  \centering
  \includegraphics{example.pdf}
  \caption{A caption}\label{fig:example}
\end{figure}

- equations: \label{eq:useful_equation_name}

\begin{equation}\label{eq:Axb}
  A x = b
\end{equation}

- sections: \label{sec:i_can_remember_this_section_name}
- tables: \label{tab:what_a_great_table_name}

Central to TeX is an algorithm for placing and spacing figures and
text so that you don’t have to. Float environments (figure, table, etc.)
should be attached to the paragraph of their first reference (more in
the next section). Avoid use of
\FloatBarrier, \newpage, \vspace,
\hspace, etc. to muscle your own spacing.

Keep your .tex document readable.

See the directory 7_dos and the readme.md
file therein.
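Prefixed labels pay off when you reference them. The sketch below contrasts plain references with the \cref command, assuming the cleveref package is installed (an assumption on our part; your journal's class may prefer one style over the other). The section title and label names are placeholders.

```latex
\documentclass{article}
\usepackage{amsmath}  % provides \eqref
\usepackage{cleveref} % assumption: cleveref is installed; load it after amsmath

\begin{document}

\section{Model}\label{sec:model}

\begin{equation}\label{eq:Axb}
  A x = b
\end{equation}

% Plain LaTeX: you type the word "Equation" yourself; ~ prevents a line break
Equation~\eqref{eq:Axb} is solved in Section~\ref{sec:model}.

% cleveref: \cref adds the type name based on what the label points to
We revisit \cref{eq:Axb} in \cref{sec:model}.

\end{document}
```

As with any cross-reference, this needs more than one pass of pdflatex before the numbers replace the ?? placeholders.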
The LaTeX system allows you to (1) insert citations in your text
using commands like \cite{ChOlSe_2021_lsrbm} which can turn
into [7], (Chaudhry et al., 2021),
[Ch21], or other citation styles within the text itself and
also (2) to print out your bibliography, formatted according to your
journal’s guidelines, using a single command in the LaTeX document like
\bibliography{mybib.bib}. Separating formatting from
information saves time: hundreds of citations will be printed
automatically in the correct format if desired including only the
sources you cited. If you decide that you no longer need a citation,
this will be removed from your bibliography automatically. Journals
often provide formatting guidelines in .bst files that can
be referred to in the \bibliographystyle{} command.
The program bibtex (or biber) reads
.aux files created by latex programs and creates a
.bbl file which is then read by the LaTeX program to format
everything (above we showed the need to run pdflatex,
bibtex, pdflatex, and pdflatex in
order to generate citations).
To use bibtex, you need a plain text file that is a
database with entries formatted in BibTeX format. For example, here is
one entry in the BibTeX file for this essay:
@article{ChOlSe_2021_lsrbm,
author = {Chaudhry, Jehanzeb H. and Olson, Luke N. and Sentz, Peter},
doi = {10.1137/20M1323552},
journal = {SIAM Journal on Scientific Computing},
number = {2},
pages = {A1081-A1107},
title = {A Least-Squares Finite Element Reduced Basis Method},
url = {https://doi.org/10.1137/20M1323552},
volume = {43},
year = {2021}
}
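To see how such an entry is used, here is a self-contained sketch. The file names cite_demo.tex and refs_demo.bib and the plain bibliography style are our own choices for illustration; the filecontents environment simply writes the .bib file next to the document so that nothing else is needed.

```latex
% cite_demo.tex: hypothetical file name. The filecontents environment writes
% refs_demo.bib next to this file so that the example is self-contained.
\begin{filecontents}{refs_demo.bib}
@article{ChOlSe_2021_lsrbm,
  author  = {Chaudhry, Jehanzeb H. and Olson, Luke N. and Sentz, Peter},
  title   = {A Least-Squares Finite Element Reduced Basis Method},
  journal = {SIAM Journal on Scientific Computing},
  volume  = {43},
  number  = {2},
  pages   = {A1081--A1107},
  year    = {2021},
  doi     = {10.1137/20M1323552}
}
\end{filecontents}

\documentclass{article}
\begin{document}

% First pdflatex pass: the key is written to the .aux file and a ? is printed.
% bibtex then builds the .bbl file; two more pdflatex passes resolve the number.
Reduced basis methods are discussed in \cite{ChOlSe_2021_lsrbm}.

\bibliographystyle{plain} % a journal would supply its own .bst file here
\bibliography{refs_demo}  % the database name, given without .bib
\end{document}
```

Swapping plain for a journal's style name changes every citation and the whole reference list without touching the text or the .bib file.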
- Do not type your own .bib entry. Grab the full citation online from the citation’s journal and/or Google Scholar, which can export BibTeX formatted entries.
- Use { } instead of “ ” around field values.
- { } also force capitalization: for example title = {All about {Krylov} methods}
- … .bib entries. This can generate warnings.
- … (.bib file) once. (And you can use tools like Zotero and BibDesk to make managing those collections of bibliographic information easier.)

See the directory 8_citations and the readme.md file therein.
Figures, tables, and math break up the text of a document and convey
information that can make or break the overall flow of your story. In
general, if a figure or table has been created using code, your project
should have a figure or table script:
linear_simulations_N100.R creates one figure
linear_simulations_N100.pdf. This figure creation file
might require as input another file with simulation results, and in turn
the simulation results creator file may need data; this dependency may
be described in a readme or Makefile. For
example, in the Makefile below, line 1 (Data/clean_data.csv: Data/clean_data.R
Data/raw_data.csv) means that the file
Data/clean_data.csv depends on Data/clean_data.R and
Data/raw_data.csv (it is created by the .R file and the
.csv file together). Line 2 is the command used to create
Data/clean_data.csv (in this case, the command is
R --file=Data/clean_data.R).
Data/clean_data.csv: Data/clean_data.R Data/raw_data.csv
	R --file=Data/clean_data.R
Analysis/linear_simulations.rda: Analysis/linear_simulations.R Data/clean_data.csv
	R --file=Analysis/linear_simulations.R
Figures/linear_simulations_N100.pdf: Figures/linear_simulations_N100.R Analysis/linear_simulations.rda
	R --file=Figures/linear_simulations_N100.R
# Each command line above must begin with a tab character.
In general, figures, tables, and math should appear close to where they are discussed in the text.

Figures are central to the overall feel of your article. Here are a few general tips for working with LaTeX and figures:

- Using \includegraphics to scale a figure will also change the font sizes; you should attempt to generate unscaled figures.
- For control over the fonts inside figures, see extrafont (R) and rcParams (matplotlib).
- If you do need to resize a figure, use the options of the \includegraphics[]{} command. For example, if we wanted to include a figure but scale it to 30% of the width of the text (the area within the left and right margins), we would use: \includegraphics[width=0.3\textwidth]{myfig.pdf}
- Place the figure environment directly after the paragraph that first references it:
Figure~\ref{fig:vaccine_by_pop} shows that opposition to vaccination peaks at a population of 100,000.
%
\begin{figure}[!ht]
\centering
\includegraphics[width=.8\textwidth]{vaccine_by_pop.pdf}
\caption{Vaccination opposition by population}\label{fig:vaccine_by_pop}
\end{figure}
- Use \begin{figure}[!ht] or \begin{table}[!ht]:
  - ! tex will ignore area restrictions
  - h place it “here” if it fits in the area
  - t place it at the “top” otherwise and if it fits
  - otherwise create a new page
- Use the xtable package (in R) to convert a matrix or data-frame to a LaTeX formatted table.

Math fonts should work with the main font of the article. For examples of good math and text font pairings see the LaTeX Font Catalogue.
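To illustrate the [!ht] placement options for a table rather than a figure, here is a short sketch that assumes the booktabs package. The numbers and the caption are made-up placeholders, not results.

```latex
\documentclass{article}
\usepackage{booktabs} % \toprule, \midrule, \bottomrule

\begin{document}

Table~\ref{tab:experiments} summarizes the runs.
%
\begin{table}[!ht] % try "here" first, then "top"; ! relaxes the usual limits
  \centering
  \caption{Placeholder results for two experiments}\label{tab:experiments}
  \begin{tabular}{lrr}
    \toprule
    & \multicolumn{1}{c}{$n$} & \multicolumn{1}{c}{$t$} \\
    \midrule
    experiment 1 & 100 & 0.32 \\
    experiment 2 & 200 & 0.78 \\
    \bottomrule
  \end{tabular}
\end{table}

\end{document}
```

The table environment sits directly after the paragraph that first mentions it, so the source reads in the same order as the finished page is likely to.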
See the directory 9_figures and the
readme.md file therein. In particular, you will consider
the following “bad” figure and how to improve it in your LaTeX
document.
… \cref{} referencing for all

A LaTeX document is a plain text file. This means that you can use
any text editor to write a LaTeX document. However, a text editor that
(1) recognizes that \textbf{} is a LaTeX command or that
(2) keeps track of matching braces and parentheses makes it easier to
write LaTeX markup. To that end, we use neovim (sometimes with the vimr gui) with vimtex plugins but we know
that there are many other approaches to typing a plain text document
using LaTeX markup.
We wrote this document using pandoc flavored markdown and turned it from plain text into HTML via the following command at the unix command line on our OS X laptops:
pandoc latex-guide.md --to html4 --from markdown+yaml_metadata_block+autolink_bare_uris+tex_math_single_backslash+inline_code_attributes --output latex-guide.html --self-contained --variable bs3=TRUE --standalone --section-divs --template latex-guide-template.html --include-in-header latex-guide-header.html --number-sections --table-of-contents --toc-depth=1 --variable theme=bootstrap --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --citeproc
Alternatively, if you have access to R, you can do the following to turn this markdown document into HTML.
Rscript -e "library(rmarkdown); render('latex-guide.md')"
[1] We have decided to write this guide in a very opinionated way, and we emphasize the nitty gritty of technical document creation. If these opinions inspire a reader to write a 10 Things Guide on using Markdown or Google Docs, please do write one! Because this is an open-source document, we are also happy to receive pull requests for improvements to this guide.
[2] Try out \title{Some Paper} and
\author{Some Person} in the preamble and
\maketitle just after the \begin{document}
line.